In this lab, you will learn functions for manipulating vector and raster data, and in particular how to run various topological operations. You will also learn the tmap package, an excellent and powerful mapping package that builds on the ggplot2 grammar of graphics but is especially catered to making powerful and beautiful maps. While the lab only introduces some of these concepts, you should read Chapter 4 of the book *Geocomputation with R* at https://geocompr.robinlovelace.net/spatial-operations.html for more details.
As we did for earlier labs, create a project in a new directory called lab5, or create a directory called lab5 first and then create the project in that existing directory. If you don't remember how, you can get a refresher on projects: Section 8.4 has the instructions on how to create a new project.
Before you start working in RMarkdown, you need to make sure you have the necessary R packages installed:
Run the R code below in your console to install the "raster", "tmap", and "spdep" packages. The other packages should already be installed, but if you don't have them you will need to install the "sf" and "tidyverse" packages too.
# This code will install required packages if they are not already installed
# ALWAYS INSTALL YOUR PACKAGES LIKE THIS!
packageList <- c("raster", "tmap", "spdep")
for (p in packageList) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
}
Now load the following packages:
library(tidyverse)
library(RColorBrewer)
library(sf)
library(spdep)
library(tmap)
library(units) ## install this package too if it gives an error, generally it is installed with the sf package
Download the zipped file here, and unzip all of its contents into your project directory. Do not store the data in any subfolders; if you do, you will have to adjust your file paths.
Load the necessary datasets below:
# load the data at the county level
county <- read_rds("county_dat.rds")
# Load the states sf object (mainly used for mapping)
states <- read_rds("states.rds")
Rds files: One way to store objects in R is to use one of R's native formats, saving the objects you create with the .rds extension. This creates a concise, memory-efficient object that is easily readable in R. Storing .rds objects is especially useful when you are reading in large files that take a long time to load in R. In this case, although we don't have a large dataset, we load the county and state objects into R by reading their respective .rds files. This ensures that all of you have the same sf objects to work with regardless of your package versions and RStudio versions.
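As a quick illustration, you can create your own .rds files with `write_rds()` from the readr package (loaded with the tidyverse); base R's `saveRDS()`/`readRDS()` work the same way. The object and file names here are made up for the example:

```r
# save any R object to a compact, native-format file
my_tbl <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
write_rds(my_tbl, "my_tbl.rds")

# read it back later -- the object is restored exactly as it was saved
my_tbl2 <- read_rds("my_tbl.rds")
identical(my_tbl, my_tbl2)  # TRUE
```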
We can use tmap to draw exploratory maps and visually inspect patterns:
# homicide rates are stored as HR60, HR70, HR80, HR90 corresponding to each decade
# Unemployment rates are stored as UE60, UE70, UE80, UE90
hr_map <- tm_shape(county) +
  tm_polygons(col = "HR90", palette = "-RdYlBu", n = 5, style = "quantile") +
  tm_shape(states) +
  tm_borders(col = "black")
hr_map
You can change the color palettes, number of classes, and the binning classifications (style) to map your patterns as accurately as possible:
Here are two simple, effective reads to help you decide how to better classify your map values:
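To see how much the choice of classification matters, you can draw the same variable with different `style` options and compare them side by side. This is a sketch; "quantile", "equal", and "jenks" are just three of the styles tmap supports:

```r
# same variable, three different binning schemes -- compare the visual impression
m_quant <- tm_shape(county) + tm_polygons(col = "HR90", n = 5, style = "quantile")
m_equal <- tm_shape(county) + tm_polygons(col = "HR90", n = 5, style = "equal")
m_jenks <- tm_shape(county) + tm_polygons(col = "HR90", n = 5, style = "jenks")

# arrange the three maps in one row
tmap_arrange(m_quant, m_equal, m_jenks, ncol = 3)
```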
To calculate spatial autocorrelation measures such as global or local Moran’s I, we need to conduct the following steps:

1. Convert our sf object to an sp object (this will likely not be needed in the future as the sf package gets more developed).
2. Define a neighborhood for each observation based on a chosen definition.
3. Define a spatial weights matrix (W) that specifies in what way, and how strongly, neighborhood influence is measured.
4. Run the spatial autocorrelation statistic along with a corresponding statistical test of whether the patterns are statistically significant.
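The four steps above can be sketched in a few lines; each step is expanded in detail in the rest of the lab:

```r
county_sp <- as(county, "Spatial")                 # 1. convert sf -> sp
q1  <- poly2nb(county_sp, queen = TRUE)            # 2. define neighbors (queen contiguity)
q1w <- nb2listw(q1, style = "W")                   # 3. build the spatial weights matrix
moran.mc(county_sp$HR90, listw = q1w, nsim = 999)  # 4. statistic + permutation test
```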
sf object to sp: we won't go into specifics, but you can convert sf objects into a different spatial class (sp) using the command below. This is needed to run the spatial autocorrelation measures. Make sure to clean any data you need to before you run this step, so that you don't then have to worry about other specifics of the `sp` package.
## convert to an SP object
county_sp <- as(county,"Spatial")
## similarly, you can convert polygons to centroids (points) using the one-line command below. We do this because we need to reduce polygons to points in order to calculate distance-based neighborhoods
## convert to centroid for graphs and nearest neighbor purposes
cnty_coords <- coordinates(county_sp)
As discussed in class, there are multiple ways in which we can define neighbors to look at spatial autocorrelation.
For this example, we define neighbors through a Queen contiguity based 1st order neighborhood definition.
In this case, a neighbor is defined as any polygon (area) that touches the observation of interest. Even a shared vertex is enough to define a neighbor. 1st order refers to the immediate nearest neighbors. As we will see below, you can also create 2nd order neighbors, which are your neighbors of neighbors, and so on.
The diagram below illustrates the concept:
# Calculate queens 1st order neighbors
## construct neighbors list
q1 <- poly2nb(county_sp, queen=TRUE) # FALSE will provide rook contiguity
# you can also get a summary that shows the average number of links and the distribution of neighbors (link number distribution), in a somewhat clunky format.
summary(q1)
Neighbour list object:
Number of regions: 3085
Number of nonzero links: 18168
Percentage nonzero weights: 0.190896
Average number of links: 5.889141
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 13 14
24 36 91 281 620 1037 704 227 50 11 2 1 1
24 least connected regions:
45 49 585 643 837 1197 1377 1402 1442 1464 1470 1523 1531 1532 1543 1596 1605 1653 1737 1767 1775 2892 2893 2919 with 1 link
1 most connected region:
1371 with 14 links
# We can also visualize the distribution of neighbors to get a better idea
# get cardinality - or the number of neighbors for each observation
## This will help us gauge whether our observations have a reasonable distribution of neighbors for the calculation. Ideally, we want the distribution to be symmetric (close to normal) with a limited range. We want to flag any bimodal distributions, where some observations have few neighbors and some have many. Generally, we also tend to remove areas with no neighbors, and take results for areas around the edges with a grain of salt, since they artificially have a smaller number of neighbors.
### For example, counties on the borders of North Carolina will have fewer neighbors than those in the middle. However, for phenomena such as disease or poverty, administrative boundaries do not prevent influence from bordering counties in neighboring states.
card(q1) %>% tibble(n = .) %>% ggplot(aes(x = n)) +
geom_histogram(col="white") + theme_minimal()
## We can also plot these connections to see how each observation is linked to others. For such an area this may not tell us much, but can be useful for smaller areas.
plot(county$geometry, lwd = 0.2)
plot(q1, cnty_coords, col='red', lwd=0.2, cex = 0.5, add = TRUE)
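Building on the cardinality idea, you could also flag observations with unusually few neighbors and inspect where they fall on a map. This is a sketch: the cutoff of 2 neighbors and the column name `few_nbrs` are arbitrary choices for illustration:

```r
# cardinality: number of neighbors for each county under the queen definition
n_links <- card(q1)

# flag counties with very few neighbors (cutoff of 2 is arbitrary)
county$few_nbrs <- n_links <= 2

# flagged counties should cluster along the study-area edges
tm_shape(county) + tm_polygons(col = "few_nbrs", palette = c("grey90", "red")) +
  tm_shape(states) + tm_borders(col = "black")
```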
Now that we have defined our neighbors, we need to tell R how to assign weights to them. Mainly, we have to decide how influential our neighbors should be. Should all neighbors be equally influential, or should some neighbors carry more weight than others?
There are two other choices to make: the style and the zero.policy arguments.
Style refers to how the spatial weights are computed for each neighbor. There are many options, but in general we will use two: “B” for binary weights and “W” for row-standardized weights. Generally, and by default, we use row-standardized weights, unless we are performing inverse distance weighting or our observations are binary (0-1).
Zero.policy deals with empty neighborhood sets. If set to FALSE, it will throw an error if there are observations with no neighbors. If TRUE, it will assign weights of zero length to regions without neighbors. This avoids the error, but also does not give us useful values, since the lag values will be zero, which may not make much sense. Hence, we might want to remove empty neighbors before we conduct our autocorrelation analysis.
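To see what row standardization does, you can compare the weights the two styles assign to the same observation's neighbors. This is a small sketch using the queen neighbors defined above:

```r
# binary weights: every neighbor gets weight 1
q1_B <- nb2listw(q1, style = "B", zero.policy = TRUE)
# row-standardized weights: each neighbor gets 1 / (number of neighbors)
q1_W <- nb2listw(q1, style = "W", zero.policy = TRUE)

# weights assigned to the first county's neighbors
q1_B$weights[[1]]   # all equal to 1
q1_W$weights[[1]]   # each equal to 1 / card(q1)[1], so the row sums to 1

# if some observations had no neighbors, one option is to drop them first:
# keep <- card(q1) > 0
# q1_sub <- subset(q1, keep); county_sub <- county[keep, ]
```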
q1w <- nb2listw(q1, style = "W", zero.policy = FALSE)
We input the value as a vector (homicide rates in 1990) and the spatial weights list. moran.mc computes the significance test via Monte Carlo permutation. adjust.n = T adjusts for no-neighbor observations, of which we have none. nsim sets the number of simulations for the significance test; more simulations give a more precise p-value, but they are computationally more expensive too.
mor_q1<- moran.mc(x = county_sp$HR90, listw = q1w, zero.policy = FALSE, nsim = 5000, adjust.n = T)
We see that the output mor_q1 is a list that contains multiple pieces of information. You can view the overall result by typing
mor_q1 # or print(mor_q1)
Monte-Carlo simulation of Moran I
data: county_sp$HR90
weights: q1w
number of simulations + 1: 5001
statistic = 0.38331, observed rank = 5001, p-value = 2e-04
alternative hypothesis: greater
However, each important component can be retrieved using the $ operator such as
mor_q1$statistic # gives the value of the observed Moran's I
statistic
0.3833136
mor_q1$p.value # gives the p value
[1] 0.00019996
You can define neighborhoods in many different ways, and calculate Moran’s I using each definition. In this lab, I show you several ways to define neighbors. You can then run Moran’s I with the different spatial weights.
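Besides contiguity, you can also define neighbors by distance, using the centroid coordinates computed earlier. This is a sketch: k = 5 and the distance band d2 = 80 are illustrative choices, and the meaning of the distance units depends on your projection:

```r
# k-nearest-neighbor definition: each county gets exactly k neighbors
knn5 <- knn2nb(knearneigh(cnty_coords, k = 5))

# distance-band definition: neighbors are all centroids between d1 and d2 units away
dist_nb <- dnearneigh(cnty_coords, d1 = 0, d2 = 80)

# either can be converted to weights and used in moran.mc as before
knn5_w <- nb2listw(knn5, style = "W")
```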
r1 <- poly2nb(county_sp, queen = FALSE) # rook contiguity: shared edges only, a shared vertex is not enough
card(r1) %>% tibble(n = .) %>% ggplot(aes(x = n)) +
geom_histogram(col="white") + theme_minimal()
plot(county$geometry, lwd = 0.2)
plot(r1, cnty_coords, col='red', lwd=0.2, cex = 0.5, add = TRUE)
r1w <- nb2listw(r1, style = "W", zero.policy = FALSE)
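With the rook-based weights in hand, the Moran's I test runs exactly as before, and you can compare the estimate to the queen-based one:

```r
mor_r1 <- moran.mc(x = county_sp$HR90, listw = r1w, zero.policy = FALSE,
                   nsim = 5000, adjust.n = TRUE)
mor_r1$statistic  # compare with the queen-based estimate mor_q1$statistic
```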
# calculates both 1st and 2nd order lags
q2lags <- nblag(q1, 2)
# retrieve 2nd order lags only
q2 <- q2lags[[2]]
card(q2) %>% tibble(n = .) %>% ggplot(aes(x = n)) +
geom_histogram(col="white") + theme_minimal()
plot(county$geometry, lwd = 0.2)
plot(q2, cnty_coords, col='red', lwd=0.2, cex = 0.5, add = TRUE)
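As with the other definitions, you can convert the 2nd order neighbors to weights and rerun the test. Here zero.policy = TRUE is used as a precaution, in case any region lacks 2nd order neighbors:

```r
q2w <- nb2listw(q2, style = "W", zero.policy = TRUE)
mor_q2 <- moran.mc(x = county_sp$HR90, listw = q2w, zero.policy = TRUE,
                   nsim = 5000, adjust.n = TRUE)
mor_q2
```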